NTU Hsuan-Tien Lin Machine Learning Foundations Course Notes 7 -- The VC Dimension


YES: ABA ARS: AlGiG (id: redstonewill) 


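The previous lectures focused on the conditions under which a machine can learn, with detailed derivations and explanations. Learning is feasible when two conditions hold:

• The hypothesis set $\mathcal{H}$ has a finite size $M$, so that when $N$ is large enough, $E_{out} \approx E_{in}$ for any hypothesis $g$ in $\mathcal{H}$.
• The algorithm $\mathcal{A}$ picks a $g$ from $\mathcal{H}$ with $E_{in}(g) \approx 0$, so that $E_{out} \approx 0$.

These two conditions correspond to the training and testing phases. Training aims to make the in-sample error $E_{in}(g) \approx 0$; testing aims to keep the error small when the hypothesis is applied to new samples, i.e. $E_{out} \approx 0$.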


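For this reason, the last lecture introduced the break point and showed that as long as a break point exists, the growth function that replaces $M$ is bounded, which guarantees $E_{out} \approx E_{in}$.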


These notes introduce the concept of the VC dimension. They also summarize how the VC dimension relates to $E_{in}(g) \approx 0$, $E_{out} \approx 0$, and the model complexity penalty (discussed below).


1. Definition of VC Dimension


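First, recall that if a hypothesis set $\mathcal{H}$ has a break point $k$, its growth function is bounded, and the upper bound is called the bound function $B(N, k)$. By mathematical induction, the bound function is itself bounded above by $N^{k-1}$. Comparing the two shows that $N^{k-1}$ is a much looser bound than $B(N, k)$.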


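$$m_{\mathcal{H}}(N) \text{ with break point } k \;\le\; B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i}, \quad \text{highest term } N^{k-1}$$

Values of $N^{k-1}$:

N      k = 3    k = 4    k = 5
2          4        8       16
3          9       27       81
4         16       64      256
5         25      125      625
6         36      216     1296

Provably and loosely, for $N \ge 2$, $k \ge 3$:

$$m_{\mathcal{H}}(N) \le B(N, k) \le N^{k-1}$$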
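As a quick numerical check of this comparison, the following minimal sketch evaluates the bound function $B(N, k) = \sum_{i=0}^{k-1} \binom{N}{i}$ next to the looser bound $N^{k-1}$. The names `bound_function` and `comparison_rows` and the chosen ranges of $N$ and $k$ are illustrative assumptions, not part of the lecture.

```python
from math import comb

def bound_function(N, k):
    """B(N, k) = sum_{i=0}^{k-1} C(N, i): upper bound on the growth
    function of a hypothesis set with break point k."""
    return sum(comb(N, i) for i in range(k))

def comparison_rows(N_values=range(2, 7), k_values=(3, 4, 5)):
    """Return (N, k, B(N, k), N^(k-1)) for every pair in the given ranges."""
    rows = []
    for N in N_values:
        for k in k_values:
            rows.append((N, k, bound_function(N, k), N ** (k - 1)))
    return rows

if __name__ == "__main__":
    for N, k, tight, loose in comparison_rows():
        print(f"N={N} k={k}  B(N,k)={tight:4d}  N^(k-1)={loose:5d}")
```

For every pair in these ranges the first value never exceeds the second, and the gap widens quickly as $N$ and $k$ grow.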
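Using the derivation from the previous lecture, the VC bound can then be rewritten as follows.

For any $g = \mathcal{A}(\mathcal{D}) \in \mathcal{H}$ and 'statistical' large $\mathcal{D}$, for $N \ge 2$, $k \ge 3$:

$$
\begin{aligned}
\mathbb{P}_{\mathcal{D}}\big[|E_{in}(g) - E_{out}(g)| > \epsilon\big]
&\le \mathbb{P}_{\mathcal{D}}\big[\exists h \in \mathcal{H} \text{ s.t. } |E_{in}(h) - E_{out}(h)| > \epsilon\big] \\
&\le 4\, m_{\mathcal{H}}(2N) \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \\
&\le 4\,(2N)^{k-1} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \quad \text{(if } k \text{ exists)}
\end{aligned}
$$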


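The inequality now depends only on $k$ and $N$. Since the sample size $N$ is usually large enough, we only need to consider $k$. This leads to the following conclusions:

• If the hypothesis set $\mathcal{H}$ has a break point $k$ and $N$ is large enough, then by the VC bound the learned hypothesis generalizes well.
• If, in addition, we pick a $g$ from $\mathcal{H}$ with $E_{in} \approx 0$, then its error rate on the whole population is also low.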


if (1) $m_{\mathcal{H}}(N)$ breaks at $k$ (good $\mathcal{H}$)
   (2) $N$ large enough (good $\mathcal{D}$)
⟹ probably generalized '$E_{out} \approx E_{in}$', and
if (3) $\mathcal{A}$ picks a $g$ with small $E_{in}$ (good $\mathcal{A}$)
⟹ probably learned! (:-) good luck)


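Now we introduce a new term: the VC dimension. The VC dimension of a hypothesis set $\mathcal{H}$ is the largest number of inputs that $\mathcal{H}$ can shatter, i.e. the largest number of points it can classify correctly under every possible labeling. (Note that it is enough for one arrangement of the inputs to be shattered.)

To "shatter" means that every labeling of the inputs can be produced. For $N$ inputs, if all $2^N$ dichotomies can be realized, those $N$ inputs are said to be shattered by $\mathcal{H}$.

Recall the definition of the break point: the smallest number of inputs for which no arrangement can be shattered by $\mathcal{H}$. The VC dimension is therefore the break point minus one.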


The VC dimension of $\mathcal{H}$, denoted $d_{VC}(\mathcal{H})$, is the largest $N$ for which $m_{\mathcal{H}}(N) = 2^N$:

• the most inputs $\mathcal{H}$ can shatter
• $d_{VC}$ = 'minimum $k$' $- 1$


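Let us revisit the four examples introduced earlier and see what their VC dimensions are:

• positive rays: $m_{\mathcal{H}}(N) = N + 1$, so $d_{VC} = 1$
• positive intervals: $m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1$, so $d_{VC} = 2$
• convex sets: $m_{\mathcal{H}}(N) = 2^N$, so $d_{VC} = \infty$
• 2D perceptrons: $m_{\mathcal{H}}(N) \le N^3$ for $N \ge 2$, so $d_{VC} = 3$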
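The first two examples are small enough to check by brute force. The sketch below enumerates every dichotomy that positive rays and positive intervals can produce on $N$ distinct points of a line, then reports the largest $N$ (up to a search limit) for which all $2^N$ dichotomies appear. The function names, the choice of points $1, \ldots, N$, and the search limit `max_N` are assumptions made for illustration.

```python
def positive_ray_dichotomies(N):
    """Dichotomies of h(x) = +1 if x > t else -1 on the points 1..N."""
    thresholds = [i + 0.5 for i in range(N + 1)]
    return {tuple(1 if x > t else -1 for x in range(1, N + 1))
            for t in thresholds}

def positive_interval_dichotomies(N):
    """Dichotomies of h(x) = +1 if lo < x < hi else -1 on the points 1..N."""
    cuts = [i + 0.5 for i in range(N + 1)]
    return {tuple(1 if lo < x < hi else -1 for x in range(1, N + 1))
            for lo in cuts for hi in cuts if lo <= hi}

def vc_dimension(dichotomy_fn, max_N=8):
    """Largest N <= max_N for which all 2^N dichotomies are produced."""
    d_vc = 0
    for N in range(1, max_N + 1):
        if len(dichotomy_fn(N)) == 2 ** N:
            d_vc = N
    return d_vc

if __name__ == "__main__":
    print("positive rays:     ", vc_dimension(positive_ray_dichotomies))
    print("positive intervals:", vc_dimension(positive_interval_dichotomies))
```

Because only the ordering of points on a line matters for these two hypothesis sets, one fixed arrangement is enough here; for perceptrons in the plane the arrangement of the points would matter.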


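Replacing $k$ with $d_{VC}$ (that is, $k - 1 = d_{VC}$), the VC bound becomes a statement about $d_{VC}$ and $N$ only. Moreover, once the $d_{VC}$ of a hypothesis set $\mathcal{H}$ is known to be finite, the first condition for learning, $E_{out} \approx E_{in}$, is satisfied regardless of the algorithm, the input distribution, and the target function.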


finite $d_{VC}$ ⟹ $g$ 'will' generalize ($E_{out}(g) \approx E_{in}(g)$)

• regardless of learning algorithm $\mathcal{A}$
• regardless of input distribution $P$
• regardless of target function $f$








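[Figure: learning flow diagram. The training examples $\mathcal{D}: (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N)$ and the hypothesis set $\mathcal{H}$ lead to the final hypothesis $g \approx f$. A finite $d_{VC}$ gives a 'worst case' guarantee on generalization.]

2. VC Dimension of Perceptrons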





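Recall the PLA algorithm in 2D introduced earlier. We know that for 2D perceptrons the break point is $k = 4$, i.e. $d_{VC} = 3$. By the VC bound, when $N$ is large enough, $E_{out}(g) \approx E_{in}(g)$. If we can also find a $g$ with $E_{in}(g) \approx 0$, then PLA is guaranteed to learn.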


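• linearly separable $\mathcal{D}$ ⟹ PLA can converge ⟹ (with $T$ large) $E_{in}(g) = 0$
• $\mathbf{x}_n \sim P$ and $y_n = f(\mathbf{x}_n)$ ⟹ $\mathbb{P}\big[|E_{in}(g) - E_{out}(g)| > \epsilon\big] \le \ldots$ by $d_{VC} = 3$ ⟹ (with $N$ large) $E_{out}(g) \approx E_{in}(g)$

Together: $E_{out}(g) \approx 0$ :-)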


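That covers the 2D case. What is $d_{VC}$ for perceptrons in higher dimensions? We know that $d_{VC} = 2$ for the 1D perceptron and $d_{VC} = 3$ for 2D perceptrons, which suggests the conjecture $d_{VC} = d + 1$, where $d$ is the dimension. To prove it, we show two things:

• $d_{VC} \ge d + 1$
• $d_{VC} \le d + 1$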


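• 1D perceptron (pos/neg rays): $d_{VC} = 2$
• 2D perceptrons: $d_{VC} = 3$
  • $d_{VC} \ge 3$: some set of 3 inputs can be shattered
  • $d_{VC} \le 3$: no set of 4 inputs can be shattered
• $d$-D perceptrons: $d_{VC} \overset{?}{=} d + 1$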


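First we prove the first inequality: $d_{VC} \ge d + 1$.

In $d$ dimensions, it is enough to find one particular set of $d + 1$ inputs that can be shattered; this immediately gives $d_{VC} \ge d + 1$. So we deliberately construct a matrix $X$ of $d$-dimensional inputs that can be shattered. $X$ contains $d + 1$ inputs, and each input is extended with the constant zeroth coordinate 1:

$$
X = \begin{bmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \vdots \\ \mathbf{x}_{d+1}^T \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \ddots & \vdots \\
1 & 0 & \cdots & 0 & 1
\end{bmatrix}
$$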


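Each row of the matrix is one input; each input has $d + 1$ components, and there are $d + 1$ inputs in total. The $X$ constructed here is clearly invertible. Shattering means that $\mathcal{H}$ can produce every possible labeling of $X$, i.e. for every $y$ we can find weights $W$ with $XW = y$, namely $W = X^{-1}y$. Since the inverse of our $X$ exists, these $d + 1$ inputs can be shattered, which proves the first inequality.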


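$X$ invertible: for any $y = (y_1, \ldots, y_{d+1})^T$, find $\mathbf{w}$ such that

$$\text{sign}(X\mathbf{w}) = y \;\Longleftarrow\; X\mathbf{w} = y \;\Longleftrightarrow\; \mathbf{w} = X^{-1}y$$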
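The construction can be checked exhaustively for small $d$. The sketch below builds the special input matrix (a first row $(1, 0, \ldots, 0)$ followed by rows with a single extra 1) and, for each of the $2^{d+1}$ labelings $y$, solves $X\mathbf{w} = y$ and confirms that $\text{sign}(X\mathbf{w}) = y$. For this particular $X$ the solution can be written by substitution ($w_0 = y_1$, $w_i = y_{i+1} - y_1$), so no linear algebra library is needed; the function names are illustrative assumptions.

```python
from itertools import product

def build_inputs(d):
    """Rows x_1..x_{d+1}: (1, 0, ..., 0) and (1, e_i) for i = 1..d."""
    rows = [[1] + [0] * d]
    for i in range(d):
        row = [1] + [0] * d
        row[i + 1] = 1
        rows.append(row)
    return rows

def solve_weights(y):
    """Solve X w = y for the special X: row 1 gives w_0 = y_1,
    row i + 1 gives w_0 + w_i = y_{i+1}."""
    return [y[0]] + [yi - y[0] for yi in y[1:]]

def sign(value):
    return 1 if value > 0 else -1

def is_shattered(d):
    """True if every labeling of the d + 1 constructed inputs is realized."""
    X = build_inputs(d)
    for y in product([-1, 1], repeat=d + 1):
        w = solve_weights(y)
        predicted = tuple(sign(sum(wi * xi for wi, xi in zip(w, x))) for x in X)
        if predicted != y:
            return False
    return True

if __name__ == "__main__":
    for d in range(1, 7):
        print(d, is_shattered(d))
```

The check succeeds for every labeling because $X\mathbf{w}$ reproduces $y$ exactly, which is a stronger condition than matching signs.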


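Next we prove the second inequality: $d_{VC} \le d + 1$.

In $d$ dimensions, if no set of $d + 2$ inputs can be shattered, the inequality holds. Take an arbitrary matrix $X$ containing $d + 2$ inputs; it has $d + 1$ columns and $d + 2$ rows. Because there are more rows than columns, the rows are linearly dependent, so one of them can be written as a linear combination of the other $d + 1$. For example, for $X_{d+2}$:

$$X_{d+2} = a_1 X_1 + a_2 X_2 + \cdots + a_{d+1} X_{d+1}$$

where, for illustration, we assume $a_1 > 0$ and $a_2, \ldots, a_{d+1} < 0$.

Now suppose some $W$ labels $X_1$ as positive and $X_2, \ldots, X_{d+1}$ as negative. Then

$$X_{d+2} W = a_1 X_1 W + a_2 X_2 W + \cdots + a_{d+1} X_{d+1} W > 0$$

because the first term is a positive coefficient times a positive value, and every other term is a negative coefficient times a negative value, so every term is positive. In this case $X_{d+2}$ must be labeled positive and can never be labeled negative.

In other words, $d + 2$ inputs cannot be shattered. This completes the proof.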


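$d$-D General Case

$X$ has more rows than columns, so its rows are linearly dependent (some $a_i$ non-zero):

$$\mathbf{x}_{d+2} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \ldots + a_{d+1}\mathbf{x}_{d+1}$$

• can you generate $(\text{sign}(a_1), \text{sign}(a_2), \ldots, \text{sign}(a_{d+1}), \times)$? if so, what $\mathbf{w}$?

$$\mathbf{w}^T\mathbf{x}_{d+2} = a_1\mathbf{w}^T\mathbf{x}_1 + a_2\mathbf{w}^T\mathbf{x}_2 + \ldots + a_{d+1}\mathbf{w}^T\mathbf{x}_{d+1} > 0 \quad \text{(contradiction!)}$$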


Combining the two results, $d_{VC} = d + 1$.
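The upper-bound argument can be illustrated on a concrete case with $d = 2$. The four points below (the corners of the unit square, each with the constant coordinate 1 prepended) satisfy $\mathbf{x}_4 = -\mathbf{x}_1 + \mathbf{x}_2 + \mathbf{x}_3$, so the labeling $(-, +, +, -)$ should be impossible. The sketch samples random weight vectors and records which dichotomies appear; the specific points, the sampling range, the number of trials, and the seed are assumptions chosen for illustration, and random sampling only illustrates the proof rather than replacing it.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Four inputs in 2D with the constant coordinate 1 prepended.
POINTS = [(1, 0, 0), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
# Linear dependence: x_4 = a_1 x_1 + a_2 x_2 + a_3 x_3.
COEFFS = (-1, 1, 1)

def dependence_holds():
    """Check x_4 = sum_i a_i x_i coordinate by coordinate."""
    return all(POINTS[3][j] == sum(COEFFS[i] * POINTS[i][j] for i in range(3))
               for j in range(3))

def sampled_dichotomies(trials=20000, seed=0):
    """Dichotomies of POINTS realized by randomly sampled perceptrons."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        seen.add(tuple(1 if dot(w, x) > 0 else -1 for x in POINTS))
    return seen

# Labels equal to sign(a_i) on the first three points force w.x_4 > 0,
# so the fourth label can never be -1.
FORBIDDEN = (-1, 1, 1, -1)

if __name__ == "__main__":
    seen = sampled_dichotomies()
    print(len(seen), "dichotomies seen; forbidden one present:", FORBIDDEN in seen)
```

The mirrored labeling $(+, -, -, +)$ is excluded by the same argument applied to $-\mathbf{w}$, which is why at most 14 of the 16 labelings can appear.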


3. Physical Intuition of VC Dimension


[Figure: "Degrees of Freedom", illustrated as a panel of adjustable knobs (modified from the work of Hu…)]


• hypothesis parameters $\mathbf{w} = (w_0, w_1, \ldots, w_d)$: creates degrees of freedom
• hypothesis quantity $M = |\mathcal{H}|$: 'analog' degrees of freedom
• hypothesis 'power' $d_{VC} = d + 1$: effective 'binary' degrees of freedom

$d_{VC}(\mathcal{H})$: powerfulness of $\mathcal{H}$


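The weights $W$ in the formulas above are also called features, and they are the degrees of freedom. Degrees of freedom can be adjusted at will, like the knobs in the figure above. The VC dimension represents the classification power of the hypothesis set: it reflects the degrees of freedom of $\mathcal{H}$ and the number of dichotomies it can produce. It is roughly equal to the number of features, although not always.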


practical rule of thumb:
$d_{VC} \approx$ #free parameters (but not always)


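For example, for 2D perceptrons doing linear classification, $d_{VC} = 3$ and $W = \{w_0, w_1, w_2\}$. In other words, three features are enough for learning, and there are 3 degrees of freedom.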


At this point we can see that $M$ and $d_{VC}$ grow together, which gives the following conclusions:


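$M$ and $d_{VC}$ (copied from Lecture 5 :-))

(1) can we make sure that $E_{out}(g)$ is close enough to $E_{in}(g)$?
(2) can we make $E_{in}(g)$ small enough?

small $M$:
  (1) Yes! $\mathbb{P}[\text{BAD}] \le 2 \cdot M \cdot \exp(\ldots)$
  (2) No! too few choices

large $M$:
  (1) No! $\mathbb{P}[\text{BAD}] \le 2 \cdot M \cdot \exp(\ldots)$
  (2) Yes! many choices

small $d_{VC}$:
  (1) Yes! $\mathbb{P}[\text{BAD}] \le 4 \cdot (2N)^{d_{VC}} \cdot \exp(\ldots)$
  (2) No! too limited power

large $d_{VC}$:
  (1) No! $\mathbb{P}[\text{BAD}] \le 4 \cdot (2N)^{d_{VC}} \cdot \exp(\ldots)$
  (2) Yes! lots of power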


4. Interpreting VC Dimension


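Next, we look more closely at what the VC dimension means. First, we restate the VC bound:

For any $g = \mathcal{A}(\mathcal{D}) \in \mathcal{H}$ and 'statistical' large $\mathcal{D}$, for $N \ge 2$, $d_{VC} \ge 2$:

$$\underbrace{\mathbb{P}_{\mathcal{D}}\big[|E_{in}(g) - E_{out}(g)| > \epsilon\big]}_{\text{BAD}} \le \underbrace{4\,(2N)^{d_{VC}} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right)}_{\delta}$$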


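By this generalization bound, the BAD event $|E_{in} - E_{out}| > \epsilon$ occurs with probability at most $\delta$. Conversely, the GOOD event occurs with probability at least $1 - \delta$. Rearranging the inequality gives: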


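..., with probability $\ge 1 - \delta$, GOOD: $|E_{in}(g) - E_{out}(g)| \le \epsilon$

$$
\begin{aligned}
\text{set } \delta &= 4\,(2N)^{d_{VC}} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \\
\frac{\delta}{4\,(2N)^{d_{VC}}} &= \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \\
\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right) &= \tfrac{1}{8}\epsilon^2 N \\
\sqrt{\frac{8}{N}\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right)} &= \epsilon
\end{aligned}
$$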
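The last line of this derivation can be evaluated directly. The sketch below computes $\epsilon = \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{VC}}}{\delta}}$ and substitutes it back into the right-hand side of the VC bound to confirm that $\delta$ is recovered. The function names `vc_penalty` and `vc_bound` and the sample values of $N$, $d_{VC}$, and $\delta$ are illustrative assumptions.

```python
from math import exp, log, sqrt

def vc_penalty(N, d_vc, delta):
    """epsilon = sqrt((8 / N) * ln(4 * (2N)^d_vc / delta))."""
    # Expand the logarithm so that (2N)^d_vc is never formed explicitly.
    log_term = log(4.0) + d_vc * log(2.0 * N) - log(delta)
    return sqrt(8.0 / N * log_term)

def vc_bound(N, d_vc, epsilon):
    """Right-hand side of the VC bound: 4 (2N)^d_vc exp(-epsilon^2 N / 8)."""
    return 4.0 * (2.0 * N) ** d_vc * exp(-epsilon ** 2 * N / 8.0)

if __name__ == "__main__":
    N, d_vc, delta = 1000, 3, 0.1
    eps = vc_penalty(N, d_vc, delta)
    print("epsilon:", eps)
    print("bound at that epsilon:", vc_bound(N, d_vc, eps))
    # The penalty grows with d_vc and shrinks as N grows.
    print(vc_penalty(N, 5, delta), vc_penalty(10 * N, d_vc, delta))
```

With $N$ and $\delta$ fixed, a larger $d_{VC}$ gives a larger $\epsilon$, while more data gives a smaller one.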


$\epsilon$ measures the generalization ability of the hypothesis set $\mathcal{H}$: the smaller $\epsilon$ is, the better the generalization.


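For any $g = \mathcal{A}(\mathcal{D}) \in \mathcal{H}$ and 'statistical' large $\mathcal{D}$, for $N \ge 2$, $d_{VC} \ge 2$:

$$\underbrace{\mathbb{P}_{\mathcal{D}}\big[|E_{in}(g) - E_{out}(g)| > \epsilon\big]}_{\text{BAD}} \le \underbrace{4\,(2N)^{d_{VC}} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right)}_{\delta}$$

..., with probability $\ge 1 - \delta$, GOOD!

$$\text{gen. error} \quad |E_{in}(g) - E_{out}(g)| \le \sqrt{\frac{8}{N}\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right)}$$

$$E_{in}(g) - \sqrt{\frac{8}{N}\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right)} \;\le\; E_{out}(g) \;\le\; E_{in}(g) + \sqrt{\frac{8}{N}\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right)}$$

$\sqrt{\ldots}$: penalty for model complexity, denoted $\Omega(N, \mathcal{H}, \delta)$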





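We have now derived bounds on the out-of-sample error $E_{out}$. We care mostly about the upper bound (the largest value $E_{out}$ can take), namely:

with a high probability,

$$E_{out}(g) \le E_{in}(g) + \underbrace{\sqrt{\frac{8}{N}\ln\!\left(\frac{4\,(2N)^{d_{VC}}}{\delta}\right)}}_{\Omega(N, \mathcal{H}, \delta)}$$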
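The second term on the right-hand side is called the model complexity. It depends on the sample size $N$, the hypothesis set $\mathcal{H}$ (through $d_{VC}$), and $\delta$. $E_{out}$ is determined jointly by $E_{in}$ and the model complexity. The figure below shows how $E_{out}$, the model complexity, and $E_{in}$ vary with $d_{VC}$: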


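• $d_{VC} \uparrow$: $E_{in} \downarrow$ but $\Omega \uparrow$
• $d_{VC} \downarrow$: $\Omega \downarrow$ but $E_{in} \uparrow$
• best $d_{VC}^*$ in the middle

[Figure: Error versus VC dimension $d_{VC}$, with three curves: in-sample error, model complexity, and out-of-sample error; the out-of-sample error is smallest at $d_{VC}^*$.]

powerful $\mathcal{H}$ not always good!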


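• As $d_{VC}$ grows, $E_{in}$ shrinks and $\Omega$ grows (more complex).
• As $d_{VC}$ shrinks, $E_{in}$ grows and $\Omega$ shrinks (simpler).
• As $d_{VC}$ increases, $E_{out}$ first decreases and then increases.

So, to minimize $E_{out}$ we cannot simply keep increasing $d_{VC}$ to reduce $E_{in}$: when $E_{in}$ becomes very small, the model complexity grows and $E_{out}$ gets larger. In other words, $d_{VC}$, and with it the number of features, must be chosen appropriately.

Next we introduce one more concept: sample complexity. Once $d_{VC}$ is fixed, how much data $\mathcal{D}$ is enough? The following example helps: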


given specs $\epsilon = 0.1$, $\delta = 0.1$, $d_{VC} = 3$, want $4\,(2N)^{d_{VC}} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right) \le \delta$

N          bound
100        2.82 × 10^7
1,000      9.17 × 10^9
10,000     1.19 × 10^8
100,000    1.65 × 10^-38
29,300     9.99 × 10^-2

sample complexity: need $N \approx 10{,}000\, d_{VC}$ in theory
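The table values can be reproduced with a few lines of code. The sketch below evaluates $4(2N)^{d_{VC}}\exp(-\frac{1}{8}\epsilon^2 N)$ for the listed $N$ and then searches, in steps of 100, for the first $N$ past the peak of the bound at which it drops to $\delta$ or below. The function names, the step size, and the search limit `N_max` are illustrative assumptions.

```python
from math import exp

def vc_bound(N, epsilon=0.1, d_vc=3):
    """4 (2N)^d_vc exp(-epsilon^2 N / 8) for the specs in the example."""
    return 4.0 * (2.0 * N) ** d_vc * exp(-epsilon ** 2 * N / 8.0)

def smallest_sufficient_N(delta=0.1, epsilon=0.1, d_vc=3, step=100, N_max=10**6):
    """First multiple of `step` past the peak of the bound with bound <= delta.

    The bound rises and then falls in N, peaking at N = 8 d_vc / epsilon^2,
    so the search starts at that peak (rounded up to a multiple of `step`)."""
    start = int(8 * d_vc / epsilon ** 2)
    start += (-start) % step
    for N in range(start, N_max + 1, step):
        if vc_bound(N, epsilon, d_vc) <= delta:
            return N
    return None

if __name__ == "__main__":
    for N in (100, 1_000, 10_000, 100_000, 29_300):
        print(f"{N:>7,d}  {vc_bound(N):.2e}")
    print("smallest sufficient N (multiple of 100):", smallest_sufficient_N())
```

The search is restricted to multiples of 100 to match the granularity of the table; a finer step would give a slightly smaller threshold.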
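The calculation gives $N = 29300$, which just meets the requirement $\delta = 0.1$. $N$ is roughly 10,000 times $d_{VC}$. That number is far too large; in practice we rarely need that many samples, and about 10 times $d_{VC}$ is usually enough. The theoretical value of $N$ is so large because the VC bound is very loose: the upper bound it gives is much larger than what is actually needed.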


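Looseness of VC Bound

$$\mathbb{P}_{\mathcal{D}}\big[|E_{in}(g) - E_{out}(g)| > \epsilon\big] \le 4\,(2N)^{d_{VC}} \exp\!\left(-\tfrac{1}{8}\epsilon^2 N\right)$$

theory: $N \approx 10{,}000\, d_{VC}$; practice: $N \approx 10\, d_{VC}$

• Hoeffding for unknown $E_{out}$: any distribution, any target
• $m_{\mathcal{H}}(N)$ instead of $|\mathcal{H}(\mathbf{x}_1, \ldots, \mathbf{x}_N)|$: 'any' data
• $N^{d_{VC}}$ instead of $m_{\mathcal{H}}(N)$: 'any' $\mathcal{H}$ of same $d_{VC}$
• union bound on worst cases: any choice made by $\mathcal{A}$

but hardly better, and 'similarly loose for all models'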


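It is worth noting that the VC bound is rather loose, and tightening it is not easy; this is one of the hard problems in machine learning. The good news is that the VC bound is about equally loose for all models, so different models can still be compared against each other. The looseness of the VC bound therefore has little effect on the feasibility of machine learning.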


5. Summary


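This lecture introduced the VC dimension, which is the largest non-break point. We then showed that the VC dimension of perceptrons in $d$ dimensions is $d + 1$. Next, we gave $d_{VC}$ a physical interpretation by connecting it to degrees of freedom. Finally, we concluded that $d_{VC}$ should be neither too large nor too small: only a suitable value makes $E_{out}$ small enough and gives the hypothesis set $\mathcal{H}$ good generalization ability.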


Note: all figures in this article are from the course Machine Learning Foundations by Hsuan-Tien Lin (National Taiwan University).


